Overview


Dataset statistics

Number of variables: 29
Number of observations: 2139048
Missing cells: 18314504
Missing cells (%): 29.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 473.3 MiB
Average record size in memory: 232.0 B
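
The percentage and per-record figures above can be checked against the raw counts. The following is a minimal Python sketch of that arithmetic. It assumes the missing-cell percentage is the number of missing cells divided by variables times observations, and that 1 MiB is 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 29
n_observations = 2_139_048
missing_cells = 18_314_504
total_size_mib = 473.3

# Assumed definition: missing cells over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumed definition: total memory divided by row count, with 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results agree with the 29.5% and 232.0 B shown above.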

Variable types

DateTime: 2
Categorical: 6
Unsupported: 1
Numeric: 8
Text: 12

Alerts

CONTRIBUTING FACTOR VEHICLE 4 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 5 High correlation
CONTRIBUTING FACTOR VEHICLE 5 is highly overall correlated with CONTRIBUTING FACTOR VEHICLE 4 High correlation
NUMBER OF CYCLIST KILLED is highly overall correlated with NUMBER OF PEDESTRIANS KILLED and 1 other field High correlation
NUMBER OF MOTORIST INJURED is highly overall correlated with NUMBER OF PERSONS INJURED High correlation
NUMBER OF MOTORIST KILLED is highly overall correlated with NUMBER OF PERSONS KILLED High correlation
NUMBER OF PEDESTRIANS KILLED is highly overall correlated with NUMBER OF CYCLIST KILLED and 1 other field High correlation
NUMBER OF PERSONS INJURED is highly overall correlated with NUMBER OF MOTORIST INJURED High correlation
NUMBER OF PERSONS KILLED is highly overall correlated with NUMBER OF CYCLIST KILLED and 2 other fields High correlation
NUMBER OF PEDESTRIANS KILLED is highly imbalanced (99.6%) Imbalance
NUMBER OF CYCLIST INJURED is highly imbalanced (92.0%) Imbalance
NUMBER OF CYCLIST KILLED is highly imbalanced (99.9%) Imbalance
CONTRIBUTING FACTOR VEHICLE 4 is highly imbalanced (90.9%) Imbalance
CONTRIBUTING FACTOR VEHICLE 5 is highly imbalanced (90.1%) Imbalance
BOROUGH has 664048 (31.0%) missing values Missing
ZIP CODE has 664310 (31.1%) missing values Missing
LATITUDE has 239440 (11.2%) missing values Missing
LONGITUDE has 239440 (11.2%) missing values Missing
LOCATION has 239440 (11.2%) missing values Missing
ON STREET NAME has 458746 (21.4%) missing values Missing
CROSS STREET NAME has 815476 (38.1%) missing values Missing
OFF STREET NAME has 1772675 (82.9%) missing values Missing
CONTRIBUTING FACTOR VEHICLE 2 has 336447 (15.7%) missing values Missing
CONTRIBUTING FACTOR VEHICLE 3 has 1985155 (92.8%) missing values Missing
CONTRIBUTING FACTOR VEHICLE 4 has 2104055 (98.4%) missing values Missing
CONTRIBUTING FACTOR VEHICLE 5 has 2129500 (99.6%) missing values Missing
VEHICLE TYPE CODE 2 has 417714 (19.5%) missing values Missing
VEHICLE TYPE CODE 3 has 1990930 (93.1%) missing values Missing
VEHICLE TYPE CODE 4 has 2105307 (98.4%) missing values Missing
VEHICLE TYPE CODE 5 has 2129794 (99.6%) missing values Missing
LATITUDE is highly skewed (γ1 = -20.03202737) Skewed
NUMBER OF PERSONS KILLED is highly skewed (γ1 = 33.18090186) Skewed
NUMBER OF MOTORIST KILLED is highly skewed (γ1 = 53.52641947) Skewed
COLLISION_ID has unique values Unique
ZIP CODE is an unsupported type, check if it needs cleaning or further analysis Unsupported
NUMBER OF PERSONS INJURED has 1636505 (76.5%) zeros Zeros
NUMBER OF PERSONS KILLED has 2135856 (99.9%) zeros Zeros
NUMBER OF PEDESTRIANS INJURED has 2020475 (94.5%) zeros Zeros
NUMBER OF MOTORIST INJURED has 1819309 (85.1%) zeros Zeros
NUMBER OF MOTORIST KILLED has 2137793 (99.9%) zeros Zeros
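
The imbalance percentages in the alerts appear consistent with one minus the normalized Shannon entropy of the category shares. That definition is an assumption made here for illustration; the report itself does not state the formula. A minimal Python sketch, using category counts that appear later in this report (`imbalance_score` is a hypothetical helper name):

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): H is the Shannon entropy (in bits) of the
    class shares and k is the number of classes. 0 means perfectly
    balanced, values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Category counts copied from the variable sections of this report.
cyclist_injured = [2080100, 58250, 673, 24, 1]
cyclist_killed = [2138791, 256, 1]

print(f"NUMBER OF CYCLIST INJURED: {imbalance_score(cyclist_injured):.3f}")
print(f"NUMBER OF CYCLIST KILLED:  {imbalance_score(cyclist_killed):.3f}")
```

With this assumed formula, both scores land close to the 92.0% and 99.9% listed in the alerts.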

Reproduction

Analysis started: 2024-12-04 17:58:22.154853
Analysis finished: 2024-12-04 18:00:52.508173
Duration: 2 minutes and 30.35 seconds
Software version: ydata-profiling v4.12.0
Download configuration: config.json

Variables

Distinct: 4536
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 MiB
Minimum: 2012-07-01 00:00:00
Maximum: 2024-11-30 00:00:00
Histogram with fixed size bins (bins=50)

Distinct: 1440
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 MiB
Minimum: 2024-12-04 00:00:00
Maximum: 2024-12-04 23:59:00
Histogram with fixed size bins (bins=50)

BOROUGH
Categorical

Missing 

Distinct: 5
Distinct (%): < 0.1%
Missing: 664048
Missing (%): 31.0%
Memory size: 16.3 MiB
BROOKLYN: 470551
QUEENS: 395650
MANHATTAN: 328674
BRONX: 218295
STATEN ISLAND: 61830

Length

Max length: 13
Median length: 9
Mean length: 7.4519586
Min length: 5

Characters and Unicode

Total characters: 10991639
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BROOKLYN
2nd row: BROOKLYN
3rd row: BRONX
4th row: BROOKLYN
5th row: MANHATTAN

Common Values

Value | Count | Frequency (%)
BROOKLYN | 470551 | 22.0%
QUEENS | 395650 | 18.5%
MANHATTAN | 328674 | 15.4%
BRONX | 218295 | 10.2%
STATEN ISLAND | 61830 | 2.9%
(Missing) | 664048 | 31.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
brooklyn | 470551 | 30.6%
queens | 395650 | 25.7%
manhattan | 328674 | 21.4%
bronx | 218295 | 14.2%
staten | 61830 | 4.0%
island | 61830 | 4.0%

Most occurring characters

Value | Count | Frequency (%)
N | 1865504 | 17.0%
O | 1159397 | 10.5%
A | 1109682 | 10.1%
E | 853130 | 7.8%
T | 781008 | 7.1%
R | 688846 | 6.3%
B | 688846 | 6.3%
L | 532381 | 4.8%
S | 519310 | 4.7%
Y | 470551 | 4.3%
Other values (9) | 2322984 | 21.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 10991639 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 10991639 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 10991639 | 100.0%

ZIP CODE
Unsupported

Missing  Rejected  Unsupported 

Missing: 664310
Missing (%): 31.1%
Memory size: 16.3 MiB

LATITUDE
Real number (ℝ)

Missing  Skewed 

Distinct: 127648
Distinct (%): 6.7%
Missing: 239440
Missing (%): 11.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.623741
Minimum: 0
Maximum: 43.344444
Zeros: 4677
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 16.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 40.596511
Q1: 40.66758
median: 40.720567
Q3: 40.769623
95-th percentile: 40.86194
Maximum: 43.344444
Range: 43.344444
Interquartile range (IQR): 0.102043

Descriptive statistics

Standard deviation: 2.0197808
Coefficient of variation (CV): 0.049719222
Kurtosis: 399.91004
Mean: 40.623741
Median Absolute Deviation (MAD): 0.0513168
Skewness: -20.032027
Sum: 77169184
Variance: 4.0795145
Monotonicity: Not monotonic
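
Several of the LATITUDE figures above are derived from one another. A minimal Python sketch of those relationships, assuming the usual definitions (range = maximum minus minimum, IQR = Q3 minus Q1, CV = standard deviation divided by mean, variance = standard deviation squared); the variable names are illustrative:

```python
# Values copied from the LATITUDE statistics above.
minimum, maximum = 0.0, 43.344444
q1, q3 = 40.66758, 40.769623
mean, std = 40.623741, 2.0197808

value_range = maximum - minimum  # report shows 43.344444
iqr = q3 - q1                    # report shows 0.102043
cv = std / mean                  # report shows 0.049719222
variance = std ** 2              # report shows 4.0795145

print(f"range={value_range:.6f} iqr={iqr:.6f} cv={cv:.6f} variance={variance:.5f}")
```

Small differences in the last digits are expected because the inputs are the rounded values printed in the report.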
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
0 | 4677 | 0.2%
40.861862 | 918 | < 0.1%
40.696033 | 793 | < 0.1%
40.8047 | 693 | < 0.1%
40.608757 | 681 | < 0.1%
40.798256 | 635 | < 0.1%
40.759308 | 633 | < 0.1%
40.6960346 | 587 | < 0.1%
40.675735 | 585 | < 0.1%
40.658577 | 544 | < 0.1%
Other values (127638) | 1888862 | 88.3%
(Missing) | 239440 | 11.2%

Minimum values

Value | Count | Frequency (%)
0 | 4677 | 0.2%
30.78418 | 1 | < 0.1%
34.783634 | 1 | < 0.1%
40.498947 | 1 | < 0.1%
40.4989488 | 2 | < 0.1%
40.4991346 | 1 | < 0.1%
40.49931 | 1 | < 0.1%
40.4994787 | 1 | < 0.1%
40.499659 | 1 | < 0.1%
40.499672 | 1 | < 0.1%

Maximum values

Value | Count | Frequency (%)
43.344444 | 1 | < 0.1%
42.64154 | 1 | < 0.1%
42.318317 | 1 | < 0.1%
42.107204 | 1 | < 0.1%
41.91661 | 1 | < 0.1%
41.34796 | 1 | < 0.1%
41.258785 | 1 | < 0.1%
41.12615 | 5 | < 0.1%
41.12421 | 1 | < 0.1%
41.061634 | 2 | < 0.1%

LONGITUDE
Real number (ℝ)

Missing 

Distinct: 99084
Distinct (%): 5.2%
Missing: 239440
Missing (%): 11.2%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.7449
Minimum: -201.35999
Maximum: 0
Zeros: 4677
Zeros (%): 0.2%
Negative: 1894931
Negative (%): 88.6%
Memory size: 16.3 MiB

Quantile statistics

Minimum: -201.35999
5-th percentile: -74.037093
Q1: -73.9747
median: -73.92709
Q3: -73.866761
95-th percentile: -73.76318
Maximum: 0
Range: 201.35999
Interquartile range (IQR): 0.1079388

Descriptive statistics

Standard deviation: 3.7881604
Coefficient of variation (CV): -0.051368439
Kurtosis: 422.31976
Mean: -73.7449
Median Absolute Deviation (MAD): 0.0525804
Skewness: 16.047859
Sum: -1.400864 × 10^8
Variance: 14.350159
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
0 | 4677 | 0.2%
-73.89063 | 790 | < 0.1%
-73.91282 | 719 | < 0.1%
-73.98453 | 708 | < 0.1%
-73.89686 | 684 | < 0.1%
-74.038086 | 682 | < 0.1%
-73.91243 | 656 | < 0.1%
-73.94476 | 618 | < 0.1%
-73.9112 | 592 | < 0.1%
-73.9845292 | 587 | < 0.1%
Other values (99074) | 1888895 | 88.3%
(Missing) | 239440 | 11.2%

Minimum values

Value | Count | Frequency (%)
-201.35999 | 1 | < 0.1%
-201.23706 | 105 | < 0.1%
-89.13527 | 1 | < 0.1%
-86.76847 | 1 | < 0.1%
-79.61955 | 1 | < 0.1%
-79.00183 | 1 | < 0.1%
-76.2634 | 1 | < 0.1%
-76.02163 | 1 | < 0.1%
-74.742 | 7 | < 0.1%
-74.25496 | 1 | < 0.1%

Maximum values

Value | Count | Frequency (%)
0 | 4677 | 0.2%
-32.768513 | 16 | < 0.1%
-47.209625 | 3 | < 0.1%
-73.66301 | 1 | < 0.1%
-73.70055 | 2 | < 0.1%
-73.700584 | 11 | < 0.1%
-73.7005968 | 10 | < 0.1%
-73.70061 | 5 | < 0.1%
-73.70071 | 4 | < 0.1%
-73.70073 | 1 | < 0.1%
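
The LONGITUDE counts can be cross-checked against the number of observations: every row is missing, zero, negative, or positive. A minimal Python sketch of that check, using only figures printed in this report (the `pct` helper is hypothetical and assumes percentages are taken over all observations, including missing rows):

```python
# Counts copied from the LONGITUDE statistics and the dataset overview.
n_observations = 2_139_048
missing = 239_440
zeros = 4_677
negative = 1_894_931

# Whatever is left after missing, zero, and negative rows must be positive.
positive = n_observations - missing - zeros - negative

def pct(count):
    """Share of all observations, as a percentage rounded to one decimal."""
    return round(100 * count / n_observations, 1)

print("positive values:", positive)
print("missing %:", pct(missing), "zeros %:", pct(zeros), "negative %:", pct(negative))
```

No rows are left over as positive, which agrees with the reported maximum of 0.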

LOCATION
Text

Missing 

Distinct: 297878
Distinct (%): 15.7%
Missing: 239440
Missing (%): 11.2%
Memory size: 16.3 MiB

Length

Max length: 25
Median length: 24
Mean length: 22.746237
Min length: 10

Characters and Unicode

Total characters: 43208933
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 166390
Unique (%): 8.8%

Sample

1st row: (40.667202, -73.8665)
2nd row: (40.683304, -73.917274)
3rd row: (40.709183, -73.956825)
4th row: (40.86816, -73.83148)
5th row: (40.67172, -73.8971)

Value | Count | Frequency (%)
0.0 | 9354 | 0.2%
40.861862 | 918 | < 0.1%
40.696033 | 793 | < 0.1%
73.89063 | 790 | < 0.1%
73.91282 | 719 | < 0.1%
73.98453 | 708 | < 0.1%
40.8047 | 693 | < 0.1%
73.89686 | 684 | < 0.1%
74.038086 | 682 | < 0.1%
40.608757 | 681 | < 0.1%
Other values (226721) | 3783194 | 99.6%

Most occurring characters

Value | Count | Frequency (%)
7 | 4730794 | 10.9%
4 | 4101049 | 9.5%
. | 3799216 | 8.8%
3 | 3601441 | 8.3%
0 | 3502505 | 8.1%
9 | 2775395 | 6.4%
8 | 2725726 | 6.3%
6 | 2694721 | 6.2%
5 | 2156181 | 5.0%
) | 1899608 | 4.4%
Other values (6) | 11222297 | 26.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 43208933 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 43208933 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 43208933 | 100.0%

ON STREET NAME
Text

Missing 

Distinct: 20297
Distinct (%): 1.2%
Missing: 458746
Missing (%): 21.4%
Memory size: 16.3 MiB

Length

Max length: 32
Median length: 32
Mean length: 29.208227
Min length: 2

Characters and Unicode

Total characters: 49078643
Distinct characters: 75
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 7380
Unique (%): 0.4%

Sample

1st row: WHITESTONE EXPRESSWAY
2nd row: QUEENSBORO BRIDGE UPPER
3rd row: THROGS NECK BRIDGE
4th row: SARATOGA AVENUE
5th row: MAJOR DEEGAN EXPRESSWAY RAMP

Value | Count | Frequency (%)
avenue | 622140 | 16.0%
street | 532426 | 13.7%
east | 156743 | 4.0%
boulevard | 129739 | 3.3%
west | 117202 | 3.0%
parkway | 77408 | 2.0%
road | 69620 | 1.8%
expressway | 66041 | 1.7%
island | 31656 | 0.8%
queens | 28007 | 0.7%
Other values (5424) | 2046868 | 52.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 27622941 | 56.3%
E | 3767585 | 7.7%
A | 2005980 | 4.1%
T | 1877599 | 3.8%
R | 1716542 | 3.5%
N | 1466857 | 3.0%
S | 1447403 | 2.9%
U | 1001507 | 2.0%
O | 893179 | 1.8%
V | 875073 | 1.8%
Other values (65) | 6403977 | 13.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 49078643 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 49078643 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 49078643 | 100.0%

CROSS STREET NAME
Text

Missing 

Distinct: 22031
Distinct (%): 1.7%
Missing: 815476
Missing (%): 38.1%
Memory size: 16.3 MiB

Length

Max length: 32
Median length: 31
Mean length: 22.458799
Min length: 1

Characters and Unicode

Total characters: 29725837
Distinct characters: 76
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 7001
Unique (%): 0.5%

Sample

1st row: 20 AVENUE
2nd row: DECATUR STREET
3rd row: EAST 43 STREET
4th row: EAST GATE PLAZA
5th row: west 80 street -west 81 street

Value | Count | Frequency (%)
avenue | 577527 | 19.7%
street | 468587 | 16.0%
east | 114391 | 3.9%
west | 72235 | 2.5%
boulevard | 70367 | 2.4%
road | 56758 | 1.9%
place | 34621 | 1.2%
parkway | 27321 | 0.9%
3 | 19241 | 0.7%
park | 17806 | 0.6%
Other values (5526) | 1468653 | 50.2%

Most occurring characters

Value | Count | Frequency (%)
(space) | 14155131 | 47.6%
E | 3004574 | 10.1%
T | 1486246 | 5.0%
A | 1455639 | 4.9%
R | 1174676 | 4.0%
N | 1101186 | 3.7%
S | 1012508 | 3.4%
U | 795008 | 2.7%
V | 727329 | 2.4%
O | 593812 | 2.0%
Other values (66) | 4219728 | 14.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 29725837 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 29725837 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 29725837 | 100.0%

OFF STREET NAME
Text

Missing 

Distinct: 238084
Distinct (%): 65.0%
Missing: 1772675
Missing (%): 82.9%
Memory size: 16.3 MiB

Length

Max length: 40
Median length: 40
Mean length: 35.365745
Min length: 8

Characters and Unicode

Total characters: 12957054
Distinct characters: 84
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 185679
Unique (%): 50.7%

Sample

1st row: 1211 LORING AVENUE
2nd row: 344 BAYCHESTER AVENUE
3rd row: 2047 PITKIN AVENUE
4th row: 480 DEAN STREET
5th row: 878 FLATBUSH AVENUE

Value | Count | Frequency (%)
avenue | 144341 | 11.9%
street | 132139 | 10.9%
east | 34831 | 2.9%
west | 25197 | 2.1%
boulevard | 22997 | 1.9%
road | 17136 | 1.4%
lot | 7881 | 0.6%
parking | 7267 | 0.6%
parkway | 7265 | 0.6%
place | 7123 | 0.6%
Other values (27876) | 811117 | 66.6%

Most occurring characters

Value | Count | Frequency (%)
(space) | 7002858 | 54.0%
E | 836211 | 6.5%
T | 458410 | 3.5%
A | 428269 | 3.3%
R | 355701 | 2.7%
N | 312373 | 2.4%
S | 300946 | 2.3%
1 | 291801 | 2.3%
U | 212281 | 1.6%
V | 198958 | 1.5%
Other values (74) | 2559246 | 19.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 12957054 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 12957054 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 12957054 | 100.0%

NUMBER OF PERSONS INJURED
Real number (ℝ)

High correlation  Zeros 

Distinct: 32
Distinct (%): < 0.1%
Missing: 18
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.31853971
Minimum: 0
Maximum: 43
Zeros: 1636505
Zeros (%): 76.5%
Negative: 0
Negative (%): 0.0%
Memory size: 16.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 43
Range: 43
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.70733463
Coefficient of variation (CV): 2.220554
Kurtosis: 48.770161
Mean: 0.31853971
Median Absolute Deviation (MAD): 0
Skewness: 4.1681048
Sum: 681366
Variance: 0.50032228
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)

Common Values

Value | Count | Frequency (%)
0 | 1636505 | 76.5%
1 | 389983 | 18.2%
2 | 73462 | 3.4%
3 | 24056 | 1.1%
4 | 8913 | 0.4%
5 | 3413 | 0.2%
6 | 1430 | 0.1%
7 | 599 | < 0.1%
8 | 267 | < 0.1%
9 | 135 | < 0.1%
Other values (22) | 267 | < 0.1%

Minimum values

Value | Count | Frequency (%)
0 | 1636505 | 76.5%
1 | 389983 | 18.2%
2 | 73462 | 3.4%
3 | 24056 | 1.1%
4 | 8913 | 0.4%
5 | 3413 | 0.2%
6 | 1430 | 0.1%
7 | 599 | < 0.1%
8 | 267 | < 0.1%
9 | 135 | < 0.1%

Maximum values

Value | Count | Frequency (%)
43 | 1 | < 0.1%
40 | 1 | < 0.1%
34 | 1 | < 0.1%
32 | 1 | < 0.1%
31 | 1 | < 0.1%
27 | 1 | < 0.1%
25 | 1 | < 0.1%
24 | 3 | < 0.1%
23 | 1 | < 0.1%
22 | 3 | < 0.1%

NUMBER OF PERSONS KILLED
Real number (ℝ)

High correlation  Skewed  Zeros 

Distinct: 7
Distinct (%): < 0.1%
Missing: 31
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0015404272
Minimum: 0
Maximum: 8
Zeros: 2135856
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 16.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.041432376
Coefficient of variation (CV): 26.896679
Kurtosis: 1852.8104
Mean: 0.0015404272
Median Absolute Deviation (MAD): 0
Skewness: 33.180902
Sum: 3295
Variance: 0.0017166418
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common Values

Value | Count | Frequency (%)
0 | 2135856 | 99.9%
1 | 3059 | 0.1%
2 | 83 | < 0.1%
3 | 12 | < 0.1%
4 | 4 | < 0.1%
5 | 2 | < 0.1%
8 | 1 | < 0.1%
(Missing) | 31 | < 0.1%
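
Because this variable has only seven distinct values, its summary statistics can be recomputed directly from the frequency table. A minimal Python sketch, assuming missing rows are excluded and the variance uses the sample (n - 1) denominator; both are assumptions about the report's conventions rather than something it states:

```python
# Value -> count, copied from the NUMBER OF PERSONS KILLED table above.
# The 31 missing rows are left out.
counts = {0: 2_135_856, 1: 3_059, 2: 83, 3: 12, 4: 4, 5: 2, 8: 1}

n = sum(counts.values())                               # non-missing rows
total = sum(value * c for value, c in counts.items())  # report shows Sum 3295
mean = total / n                                       # report shows 0.0015404272

# Sample variance with an n - 1 denominator (assumed convention).
variance = sum(c * (value - mean) ** 2 for value, c in counts.items()) / (n - 1)

print(n, total, mean, variance)
```

The non-missing row count equals the 2139048 observations minus the 31 missing values, and the recomputed mean and variance agree with the report to the precision shown.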

Minimum values

Value | Count | Frequency (%)
0 | 2135856 | 99.9%
1 | 3059 | 0.1%
2 | 83 | < 0.1%
3 | 12 | < 0.1%
4 | 4 | < 0.1%
5 | 2 | < 0.1%
8 | 1 | < 0.1%

Maximum values

Value | Count | Frequency (%)
8 | 1 | < 0.1%
5 | 2 | < 0.1%
4 | 4 | < 0.1%
3 | 12 | < 0.1%
2 | 83 | < 0.1%
1 | 3059 | 0.1%
0 | 2135856 | 99.9%

NUMBER OF PEDESTRIANS INJURED
Real number (ℝ)

Zeros 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.057821517
Minimum: 0
Maximum: 27
Zeros: 2020475
Zeros (%): 94.5%
Negative: 0
Negative (%): 0.0%
Memory size: 16.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 27
Range: 27
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2465722
Coefficient of variation (CV): 4.2643676
Kurtosis: 121.50453
Mean: 0.057821517
Median Absolute Deviation (MAD): 0
Skewness: 5.5674996
Sum: 123683
Variance: 0.060797851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)

Common Values

Value | Count | Frequency (%)
0 | 2020475 | 94.5%
1 | 114212 | 5.3%
2 | 3865 | 0.2%
3 | 383 | < 0.1%
4 | 62 | < 0.1%
5 | 27 | < 0.1%
6 | 11 | < 0.1%
7 | 5 | < 0.1%
9 | 2 | < 0.1%
8 | 2 | < 0.1%
Other values (4) | 4 | < 0.1%

Minimum values

Value | Count | Frequency (%)
0 | 2020475 | 94.5%
1 | 114212 | 5.3%
2 | 3865 | 0.2%
3 | 383 | < 0.1%
4 | 62 | < 0.1%
5 | 27 | < 0.1%
6 | 11 | < 0.1%
7 | 5 | < 0.1%
8 | 2 | < 0.1%
9 | 2 | < 0.1%

Maximum values

Value | Count | Frequency (%)
27 | 1 | < 0.1%
19 | 1 | < 0.1%
15 | 1 | < 0.1%
13 | 1 | < 0.1%
9 | 2 | < 0.1%
8 | 2 | < 0.1%
7 | 5 | < 0.1%
6 | 11 | < 0.1%
5 | 27 | < 0.1%
4 | 62 | < 0.1%

NUMBER OF PEDESTRIANS KILLED
Categorical

High correlation  Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 MiB
0: 2137443
1: 1590
2: 13
6: 1
4: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2139048
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 2137443 | 99.9%
1 | 1590 | 0.1%
2 | 13 | < 0.1%
6 | 1 | < 0.1%
4 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 2137443 | 99.9%
1 | 1590 | 0.1%
2 | 13 | < 0.1%
6 | 1 | < 0.1%
4 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 2137443 | 99.9%
1 | 1590 | 0.1%
2 | 13 | < 0.1%
6 | 1 | < 0.1%
4 | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

NUMBER OF CYCLIST INJURED
Categorical

Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 MiB
0: 2080100
1: 58250
2: 673
3: 24
4: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2139048
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 2080100 | 97.2%
1 | 58250 | 2.7%
2 | 673 | < 0.1%
3 | 24 | < 0.1%
4 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 2080100 | 97.2%
1 | 58250 | 2.7%
2 | 673 | < 0.1%
3 | 24 | < 0.1%
4 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 2080100 | 97.2%
1 | 58250 | 2.7%
2 | 673 | < 0.1%
3 | 24 | < 0.1%
4 | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

NUMBER OF CYCLIST KILLED
Categorical

High correlation  Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 MiB
0: 2138791
1: 256
2: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2139048
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 2138791 | > 99.9%
1 | 256 | < 0.1%
2 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 2138791 | > 99.9%
1 | 256 | < 0.1%
2 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 2138791 | > 99.9%
1 | 256 | < 0.1%
2 | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 2139048 | 100.0%

NUMBER OF MOTORIST INJURED
Real number (ℝ)

High correlation  Zeros 

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.22864143
Minimum: 0
Maximum: 43
Zeros: 1819309
Zeros (%): 85.1%
Negative: 0
Negative (%): 0.0%
Memory size: 16.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 43
Range: 43
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.66852401
Coefficient of variation (CV): 2.923897
Kurtosis: 60.610294
Mean: 0.22864143
Median Absolute Deviation (MAD): 0
Skewness: 5.0249595
Sum: 489075
Variance: 0.44692435
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)

Common Values

Value | Count | Frequency (%)
0 | 1819309 | 85.1%
1 | 214875 | 10.0%
2 | 66859 | 3.1%
3 | 23319 | 1.1%
4 | 8727 | 0.4%
5 | 3360 | 0.2%
6 | 1382 | 0.1%
7 | 573 | < 0.1%
8 | 258 | < 0.1%
9 | 130 | < 0.1%
Other values (21) | 256 | < 0.1%

Minimum values

Value | Count | Frequency (%)
0 | 1819309 | 85.1%
1 | 214875 | 10.0%
2 | 66859 | 3.1%
3 | 23319 | 1.1%
4 | 8727 | 0.4%
5 | 3360 | 0.2%
6 | 1382 | 0.1%
7 | 573 | < 0.1%
8 | 258 | < 0.1%
9 | 130 | < 0.1%

Maximum values

Value | Count | Frequency (%)
43 | 1 | < 0.1%
40 | 1 | < 0.1%
34 | 1 | < 0.1%
31 | 1 | < 0.1%
30 | 1 | < 0.1%
25 | 1 | < 0.1%
24 | 3 | < 0.1%
23 | 1 | < 0.1%
22 | 2 | < 0.1%
21 | 1 | < 0.1%

NUMBER OF MOTORIST KILLED
Real number (ℝ)

High correlation  Skewed  Zeros 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.00063532936
Minimum: 0
Maximum: 5
Zeros: 2137793
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 16.3 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.027572005
Coefficient of variation (CV): 43.397971
Kurtosis: 4006.7511
Mean: 0.00063532936
Median Absolute Deviation (MAD): 0
Skewness: 53.526419
Sum: 1359
Variance: 0.00076021545
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common Values

Value | Count | Frequency (%)
0 | 2137793 | 99.9%
1 | 1173 | 0.1%
2 | 66 | < 0.1%
3 | 12 | < 0.1%
4 | 2 | < 0.1%
5 | 2 | < 0.1%

Minimum values

Value | Count | Frequency (%)
0 | 2137793 | 99.9%
1 | 1173 | 0.1%
2 | 66 | < 0.1%
3 | 12 | < 0.1%
4 | 2 | < 0.1%
5 | 2 | < 0.1%

Maximum values

Value | Count | Frequency (%)
5 | 2 | < 0.1%
4 | 2 | < 0.1%
3 | 12 | < 0.1%
2 | 66 | < 0.1%
1 | 1173 | 0.1%
0 | 2137793 | 99.9%

Distinct: 61
Distinct (%): < 0.1%
Missing: 7247
Missing (%): 0.3%
Memory size: 16.3 MiB

Length

Max length: 53
Median length: 43
Mean length: 19.558659
Min length: 1

Characters and Unicode

Total characters: 41695169
Distinct characters: 55
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Aggressive Driving/Road Rage
2nd row: Pavement Slippery
3rd row: Following Too Closely
4th row: Unspecified
5th row: Unspecified

Value | Count | Frequency (%)
unspecified | 722522 | 17.0%
driver | 464742 | 10.9%
inattention/distraction | 430808 | 10.1%
closely | 168527 | 4.0%
too | 168527 | 4.0%
to | 153071 | 3.6%
failure | 133931 | 3.1%
yield | 127534 | 3.0%
right-of-way | 127534 | 3.0%
following | 114773 | 2.7%
Other values (96) | 1648193 | 38.7%

Most occurring characters

Value | Count | Frequency (%)
i | 4683319 | 11.2%
e | 4243184 | 10.2%
n | 3625879 | 8.7%
t | 2898877 | 7.0%
o | 2465185 | 5.9%
r | 2457408 | 5.9%
s | 2162592 | 5.2%
(space) | 2128361 | 5.1%
a | 2061170 | 4.9%
c | 1600117 | 3.8%
Other values (45) | 13369077 | 32.1%

Most occurring categories

ValueCountFrequency (%)
(unknown) 41695169
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
i 4683319
 
11.2%
e 4243184
 
10.2%
n 3625879
 
8.7%
t 2898877
 
7.0%
o 2465185
 
5.9%
r 2457408
 
5.9%
s 2162592
 
5.2%
2128361
 
5.1%
a 2061170
 
4.9%
c 1600117
 
3.8%
Other values (45) 13369077
32.1%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 41695169
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
i 4683319
 
11.2%
e 4243184
 
10.2%
n 3625879
 
8.7%
t 2898877
 
7.0%
o 2465185
 
5.9%
r 2457408
 
5.9%
s 2162592
 
5.2%
2128361
 
5.1%
a 2061170
 
4.9%
c 1600117
 
3.8%
Other values (45) 13369077
32.1%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 41695169
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
i 4683319
 
11.2%
e 4243184
 
10.2%
n 3625879
 
8.7%
t 2898877
 
7.0%
o 2465185
 
5.9%
r 2457408
 
5.9%
s 2162592
 
5.2%
2128361
 
5.1%
a 2061170
 
4.9%
c 1600117
 
3.8%
Other values (45) 13369077
32.1%

CONTRIBUTING FACTOR VEHICLE 2
Text

Distinct: 61
Distinct (%): < 0.1%
Missing: 336447
Missing (%): 15.7%
Memory size: 16.3 MiB

Length

Max length: 53
Median length: 11
Mean length: 13.05409
Min length: 1

Characters and Unicode

Total characters: 23531315
Distinct characters: 55
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified

Value | Count | Frequency (%)
unspecified | 1517571 | 68.6%
driver | 103988 | 4.7%
inattention/distraction | 97069 | 4.4%
other | 33942 | 1.5%
vehicular | 32876 | 1.5%
too | 28805 | 1.3%
closely | 28805 | 1.3%
passing | 22300 | 1.0%
to | 22054 | 1.0%
lane | 20754 | 0.9%
Other values (96) | 304097 | 13.7%

Most occurring characters

Value | Count | Frequency (%)
i | 3708448 | 15.8%
e | 3610400 | 15.3%
n | 2109289 | 9.0%
s | 1807319 | 7.7%
c | 1712694 | 7.3%
d | 1593491 | 6.8%
p | 1589343 | 6.8%
f | 1575663 | 6.7%
U | 1555327 | 6.6%
t | 637147 | 2.7%
Other values (45) | 3632194 | 15.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 23531315 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 23531315 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 23531315 | 100.0%

CONTRIBUTING FACTOR VEHICLE 3
Text

Distinct: 52
Distinct (%): < 0.1%
Missing: 1985155
Missing (%): 92.8%
Memory size: 16.3 MiB

Length

Max length: 53
Median length: 11
Mean length: 11.658977
Min length: 1

Characters and Unicode

Total characters: 1794235
Distinct characters: 55
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified

Value | Count | Frequency (%)
unspecified | 143435 | 85.8%
other | 2964 | 1.8%
vehicular | 2924 | 1.7%
driver | 2229 | 1.3%
closely | 2093 | 1.3%
too | 2093 | 1.3%
inattention/distraction | 2040 | 1.2%
following | 2036 | 1.2%
fatigued/drowsy | 853 | 0.5%
pavement | 418 | 0.2%
Other values (80) | 6145 | 3.7%

Most occurring characters

Value | Count | Frequency (%)
e | 306610 | 17.1%
i | 305180 | 17.0%
n | 157369 | 8.8%
s | 150660 | 8.4%
c | 150109 | 8.4%
d | 145596 | 8.1%
p | 145157 | 8.1%
f | 144378 | 8.0%
U | 144141 | 8.0%
o | 17939 | 1.0%
Other values (45) | 127096 | 7.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1794235 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1794235 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1794235 | 100.0%

CONTRIBUTING FACTOR VEHICLE 4
Categorical

High correlation  Imbalance  Missing 

Distinct: 42
Distinct (%): 0.1%
Missing: 2104055
Missing (%): 98.4%
Memory size: 16.3 MiB

Value | Count
Unspecified | 33009
Other Vehicular | 654
Following Too Closely | 403
Driver Inattention/Distraction | 289
Fatigued/Drowsy | 170
Other values (37) | 468

Length

Max length: 43
Median length: 11
Mean length: 11.489927
Min length: 5

Characters and Unicode

Total characters: 402067
Distinct characters: 51
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified
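
The length and missing-value figures for CONTRIBUTING FACTOR VEHICLE 4 are consistent with one another, which can be checked with a few lines of arithmetic. This is a rough sketch: the observation count (2139048) comes from the report overview, and the assumption is that mean length is total characters divided by the number of non-missing values.

```python
# Figures copied from the report.
n_observations = 2139048    # "Number of observations" in the overview
missing = 2104055           # "Missing" for CONTRIBUTING FACTOR VEHICLE 4
total_characters = 402067   # "Total characters"

non_missing = n_observations - missing
mean_length = total_characters / non_missing       # assumed definition
missing_pct = 100 * missing / n_observations

print(f"non-missing values: {non_missing}")
print(f"mean length: {mean_length:.6f}")
print(f"missing: {missing_pct:.1f}%")
```

The result agrees with the reported mean length and missing percentage to the precision shown, so the length statistics appear to be computed over non-missing values only.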

Common Values

Value | Count | Frequency (%)
Unspecified | 33009 | 1.5%
Other Vehicular | 654 | < 0.1%
Following Too Closely | 403 | < 0.1%
Driver Inattention/Distraction | 289 | < 0.1%
Fatigued/Drowsy | 170 | < 0.1%
Pavement Slippery | 120 | < 0.1%
Reaction to Uninvolved Vehicle | 43 | < 0.1%
Unsafe Speed | 34 | < 0.1%
Outside Car Distraction | 31 | < 0.1%
Driver Inexperience | 30 | < 0.1%
Other values (32) | 210 | < 0.1%
(Missing) | 2104055 | 98.4%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
unspecified | 33009 | 88.1%
other | 663 | 1.8%
vehicular | 654 | 1.7%
too | 408 | 1.1%
closely | 408 | 1.1%
following | 403 | 1.1%
driver | 319 | 0.9%
inattention/distraction | 289 | 0.8%
fatigued/drowsy | 170 | 0.5%
pavement | 123 | 0.3%
Other values (65) | 1009 | 2.7%

Most occurring characters

Value | Count | Frequency (%)
e | 69768 | 17.4%
i | 69115 | 17.2%
n | 35182 | 8.8%
c | 34238 | 8.5%
s | 34200 | 8.5%
p | 33386 | 8.3%
d | 33372 | 8.3%
f | 33139 | 8.2%
U | 33121 | 8.2%
o | 3186 | 0.8%
Other values (41) | 23360 | 5.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 402067 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 402067 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 402067 | 100.0%

CONTRIBUTING FACTOR VEHICLE 5
Categorical

High correlation  Imbalance  Missing 

Distinct: 31
Distinct (%): 0.3%
Missing: 2129500
Missing (%): 99.6%
Memory size: 16.3 MiB

Value | Count
Unspecified | 9001
Other Vehicular | 193
Following Too Closely | 104
Driver Inattention/Distraction | 67
Pavement Slippery | 50
Other values (26) | 133

Length

Max length: 43
Median length: 11
Mean length: 11.467114
Min length: 5

Characters and Unicode

Total characters: 109488
Distinct characters: 50
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified

Common Values

Value | Count | Frequency (%)
Unspecified | 9001 | 0.4%
Other Vehicular | 193 | < 0.1%
Following Too Closely | 104 | < 0.1%
Driver Inattention/Distraction | 67 | < 0.1%
Pavement Slippery | 50 | < 0.1%
Fatigued/Drowsy | 41 | < 0.1%
Reaction to Uninvolved Vehicle | 12 | < 0.1%
Alcohol Involvement | 11 | < 0.1%
Obstruction/Debris | 10 | < 0.1%
Driver Inexperience | 10 | < 0.1%
Other values (21) | 49 | < 0.1%
(Missing) | 2129500 | 99.6%
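
The frequencies in the Common Values table are shares of all observations, missing rows included, which is why "Unspecified" is only 0.4% here despite dominating the non-missing values. A small sketch of that calculation follows; the `frequency` helper and its "< 0.1%" formatting rule are illustrative assumptions, not the report generator's actual code.

```python
# Counts copied from the Common Values table for CONTRIBUTING FACTOR VEHICLE 5.
n_observations = 2139048  # "Number of observations" in the report overview
missing = 2129500
common_values = {
    "Unspecified": 9001,
    "Other Vehicular": 193,
    "Following Too Closely": 104,
    "Driver Inattention/Distraction": 67,
    "Pavement Slippery": 50,
    "Fatigued/Drowsy": 41,
    "Reaction to Uninvolved Vehicle": 12,
    "Alcohol Involvement": 11,
    "Obstruction/Debris": 10,
    "Driver Inexperience": 10,
    "Other values (21)": 49,
}

def frequency(count, total=n_observations):
    """Format a count as a percentage of all rows (hypothetical helper)."""
    pct = 100 * count / total
    return "< 0.1%" if 0 < pct < 0.1 else f"{pct:.1f}%"

# The listed counts plus the missing rows account for every observation.
accounted_for = sum(common_values.values()) + missing

for label, count in common_values.items():
    print(label, count, frequency(count))
print("(Missing)", missing, frequency(missing))
```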

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
unspecified | 9001 | 88.2%
other | 195 | 1.9%
vehicular | 193 | 1.9%
too | 106 | 1.0%
closely | 106 | 1.0%
following | 104 | 1.0%
driver | 77 | 0.8%
inattention/distraction | 67 | 0.7%
pavement | 51 | 0.5%
slippery | 50 | 0.5%
Other values (48) | 256 | 2.5%
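
The word-level counts for CONTRIBUTING FACTOR VEHICLE 5 appear to come from splitting each category label into lowercase, whitespace-separated tokens and weighting each token by the category's count. The sketch below applies that assumed tokenization to the five most common categories only.

```python
from collections import Counter

# Five most common categories and their counts, copied from the report.
category_counts = {
    "Unspecified": 9001,
    "Other Vehicular": 193,
    "Following Too Closely": 104,
    "Driver Inattention/Distraction": 67,
    "Pavement Slippery": 50,
}

# Assumed tokenization: lowercase, split on whitespace, keep "/" intact.
word_counts = Counter()
for label, count in category_counts.items():
    for word in label.lower().split():
        word_counts[word] += count

for word, count in word_counts.most_common():
    print(word, count)
```

The report lists slightly higher totals for some words (195 for "other", 106 for "too" and "closely") because rarer categories outside these five contribute tokens as well.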

Most occurring characters

Value | Count | Frequency (%)
e | 19065 | 17.4%
i | 18811 | 17.2%
n | 9549 | 8.7%
c | 9340 | 8.5%
s | 9285 | 8.5%
p | 9129 | 8.3%
d | 9087 | 8.3%
f | 9028 | 8.2%
U | 9024 | 8.2%
o | 818 | 0.7%
Other values (40) | 6352 | 5.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 109488 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 109488 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 109488 | 100.0%

COLLISION_ID
Real number (ℝ)

Unique 

Distinct: 2139048
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3206744.8
Minimum: 22
Maximum: 4775840
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 16.3 MiB

Quantile statistics

Minimum: 22
5-th percentile: 107815.35
Q1: 3170881.8
Median: 3705786.5
Q3: 4240781.2
95-th percentile: 4668661.7
Maximum: 4775840
Range: 4775818
Interquartile range (IQR): 1069899.5

Descriptive statistics

Standard deviation: 1506827.2
Coefficient of variation (CV): 0.46989307
Kurtosis: 0.052063161
Mean: 3206744.8
Median Absolute Deviation (MAD): 534950
Skewness: -1.2425404
Sum: 6.859381 × 10^12
Variance: 2.2705281 × 10^12
Monotonicity: Not monotonic
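
Several of the COLLISION_ID statistics are derived from others, so they can be cross-checked against the rounded figures in the report. This sketch uses only the printed values; small discrepancies are expected because the inputs are already rounded.

```python
# Rounded figures copied from the COLLISION_ID statistics.
minimum, maximum = 22, 4775840
q1, q3 = 3170881.8, 4240781.2
mean, std = 3206744.8, 1506827.2
n = 2139048  # distinct values; every row has a unique COLLISION_ID

value_range = maximum - minimum   # "Range"
iqr = q3 - q1                     # "Interquartile range (IQR)"
cv = std / mean                   # "Coefficient of variation (CV)"
approx_sum = mean * n             # "Sum", up to rounding of the mean

print(f"range={value_range} iqr={iqr:.1f} cv={cv:.8f} sum={approx_sum:.6e}")
```

The IQR computed this way differs from the reported 1069899.5 by about 0.1, which is consistent with Q1 and Q3 having been rounded to one decimal place before printing.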
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
4775537 | 1 | < 0.1%
4455765 | 1 | < 0.1%
4513547 | 1 | < 0.1%
4541903 | 1 | < 0.1%
4456314 | 1 | < 0.1%
4486609 | 1 | < 0.1%
4407458 | 1 | < 0.1%
4486555 | 1 | < 0.1%
4775649 | 1 | < 0.1%
4775076 | 1 | < 0.1%
Other values (2139038) | 2139038 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
22 | 1 | < 0.1%
23 | 1 | < 0.1%
24 | 1 | < 0.1%
25 | 1 | < 0.1%
26 | 1 | < 0.1%
27 | 1 | < 0.1%
28 | 1 | < 0.1%
29 | 1 | < 0.1%
30 | 1 | < 0.1%
31 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
4775840 | 1 | < 0.1%
4775835 | 1 | < 0.1%
4775832 | 1 | < 0.1%
4775820 | 1 | < 0.1%
4775817 | 1 | < 0.1%
4775815 | 1 | < 0.1%
4775810 | 1 | < 0.1%
4775809 | 1 | < 0.1%
4775807 | 1 | < 0.1%
4775801 | 1 | < 0.1%

VEHICLE TYPE CODE 1
Text

Distinct: 1740
Distinct (%): 0.1%
Missing: 14731
Missing (%): 0.7%
Memory size: 16.3 MiB

Length

Max length: 38
Median length: 35
Mean length: 16.858735
Min length: 1

Characters and Unicode

Total characters: 35813298
Distinct characters: 77
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1057
Unique (%): < 0.1%

Sample

1st row: Sedan
2nd row: Sedan
3rd row: Sedan
4th row: Sedan
5th row: Dump

Value | Count | Frequency (%)
vehicle | 902084 | 18.0%
utility | 655616 | 13.1%
station | 655572 | 13.1%
sedan | 647727 | 12.9%
wagon/sport | 475280 | 9.5%
passenger | 416223 | 8.3%
(blank) | 181733 | 3.6%
wagon | 180357 | 3.6%
sport | 180291 | 3.6%
truck | 89093 | 1.8%
Other values (1003) | 630045 | 12.6%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2902924 | 8.1%
S | 2807922 | 7.8%
t | 2412208 | 6.7%
i | 2031525 | 5.7%
E | 1820010 | 5.1%
a | 1696201 | 4.7%
e | 1689518 | 4.7%
n | 1621382 | 4.5%
o | 1507025 | 4.2%
T | 1147439 | 3.2%
Other values (67) | 16177144 | 45.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 35813298 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 35813298 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 35813298 | 100.0%

VEHICLE TYPE CODE 2
Text

Missing 

Distinct: 1927
Distinct (%): 0.1%
Missing: 417714
Missing (%): 19.5%
Memory size: 16.3 MiB

Length

Max length: 38
Median length: 30
Mean length: 16.048236
Min length: 1

Characters and Unicode

Total characters: 27624375
Distinct characters: 73
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1144
Unique (%): 0.1%

Sample

1st row: Sedan
2nd row: Pick-up Truck
3rd row: Sedan
4th row: Tractor Truck Diesel
5th row: Sedan

Value | Count | Frequency (%)
vehicle | 666333 | 17.0%
utility | 479362 | 12.2%
station | 479331 | 12.2%
sedan | 452060 | 11.5%
wagon/sport | 339127 | 8.7%
passenger | 318613 | 8.1%
(blank) | 141595 | 3.6%
wagon | 140261 | 3.6%
sport | 140204 | 3.6%
truck | 88557 | 2.3%
Other values (1051) | 670787 | 17.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2207864 | 8.0%
S | 2073695 | 7.5%
t | 1731570 | 6.3%
i | 1488595 | 5.4%
E | 1440409 | 5.2%
e | 1240318 | 4.5%
a | 1210326 | 4.4%
n | 1150132 | 4.2%
o | 1104650 | 4.0%
T | 924311 | 3.3%
Other values (63) | 13052505 | 47.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 27624375 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 27624375 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 27624375 | 100.0%

VEHICLE TYPE CODE 3
Text

Missing 

Distinct: 276
Distinct (%): 0.2%
Missing: 1990930
Missing (%): 93.1%
Memory size: 16.3 MiB

Length

Max length: 35
Median length: 30
Mean length: 17.666644
Min length: 2

Characters and Unicode

Total characters: 2616748
Distinct characters: 62
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 163
Unique (%): 0.1%

Sample

1st row: Sedan
2nd row: Station Wagon/Sport Utility Vehicle
3rd row: Sedan
4th row: Sedan
5th row: Sedan

Value | Count | Frequency (%)
vehicle | 66321 | 18.5%
utility | 51533 | 14.4%
station | 51530 | 14.4%
sedan | 49693 | 13.8%
wagon/sport | 38171 | 10.6%
passenger | 27716 | 7.7%
(blank) | 13443 | 3.7%
wagon | 13359 | 3.7%
sport | 13358 | 3.7%
truck | 4583 | 1.3%
Other values (225) | 29153 | 8.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 211177 | 8.1%
S | 207278 | 7.9%
t | 192358 | 7.4%
i | 158893 | 6.1%
a | 129793 | 5.0%
e | 129373 | 4.9%
n | 126987 | 4.9%
o | 117714 | 4.5%
E | 116431 | 4.4%
l | 77821 | 3.0%
Other values (52) | 1148923 | 43.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 2616748 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 2616748 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 2616748 | 100.0%

VEHICLE TYPE CODE 4
Text

Missing 

Distinct: 108
Distinct (%): 0.3%
Missing: 2105307
Missing (%): 98.4%
Memory size: 16.3 MiB

Length

Max length: 35
Median length: 30
Mean length: 18.007587
Min length: 2

Characters and Unicode

Total characters: 607594
Distinct characters: 58
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 51
Unique (%): 0.2%

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Sedan
3rd row: Station Wagon/Sport Utility Vehicle
4th row: Sedan
5th row: Sedan

Value | Count | Frequency (%)
vehicle | 15533 | 18.9%
utility | 12359 | 15.0%
station | 12359 | 15.0%
sedan | 12083 | 14.7%
wagon/sport | 9507 | 11.5%
passenger | 5970 | 7.2%
(blank) | 2860 | 3.5%
sport | 2852 | 3.5%
wagon | 2852 | 3.5%
truck | 844 | 1.0%
Other values (107) | 5159 | 6.3%

Most occurring characters

Value | Count | Frequency (%)
(space) | 48693 | 8.0%
S | 48374 | 8.0%
t | 47762 | 7.9%
i | 39190 | 6.5%
a | 31788 | 5.2%
e | 31578 | 5.2%
n | 31253 | 5.1%
o | 29021 | 4.8%
E | 24673 | 4.1%
l | 19262 | 3.2%
Other values (48) | 256000 | 42.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 607594 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 607594 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 607594 | 100.0%

VEHICLE TYPE CODE 5
Text

Missing 

Distinct: 73
Distinct (%): 0.8%
Missing: 2129794
Missing (%): 99.6%
Memory size: 16.3 MiB

Length

Max length: 35
Median length: 30
Mean length: 18.154636
Min length: 2

Characters and Unicode

Total characters: 168003
Distinct characters: 55
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 0.4%

Sample

1st row: Station Wagon/Sport Utility Vehicle
2nd row: Station Wagon/Sport Utility Vehicle
3rd row: Sedan
4th row: Sedan
5th row: Station Wagon/Sport Utility Vehicle

Value | Count | Frequency (%)
vehicle | 4200 | 18.5%
station | 3506 | 15.4%
utility | 3506 | 15.4%
sedan | 3427 | 15.1%
wagon/sport | 2704 | 11.9%
passenger | 1487 | 6.5%
(blank) | 804 | 3.5%
wagon | 804 | 3.5%
sport | 802 | 3.5%
truck | 261 | 1.1%
Other values (72) | 1236 | 5.4%

Most occurring characters

Value | Count | Frequency (%)
t | 13597 | 8.1%
(space) | 13493 | 8.0%
S | 13331 | 7.9%
i | 11152 | 6.6%
a | 9025 | 5.4%
e | 8974 | 5.3%
n | 8898 | 5.3%
o | 8279 | 4.9%
E | 6130 | 3.6%
l | 5482 | 3.3%
Other values (45) | 69642 | 41.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 168003 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 168003 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 168003 | 100.0%

Interactions

[Figures: 64 pairwise interaction plots; images not preserved]

Correlations

Variable | BOROUGH | COLLISION_ID | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | LATITUDE | LONGITUDE | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED
BOROUGH | 1.000 | 0.055 | 0.052 | 0.047 | 0.006 | 0.006 | 0.028 | 0.001 | 0.008 | 0.004 | 0.002 | 0.000 | 0.008 | 0.002
COLLISION_ID | 0.055 | 1.000 | 0.064 | 0.076 | -0.014 | 0.068 | 0.040 | 0.004 | 0.114 | 0.009 | 0.033 | 0.004 | 0.147 | 0.011
CONTRIBUTING FACTOR VEHICLE 4 | 0.052 | 0.064 | 1.000 | 0.694 | 0.000 | 0.000 | 0.000 | 0.000 | 0.024 | 0.000 | 0.143 | 0.000 | 0.026 | 0.000
CONTRIBUTING FACTOR VEHICLE 5 | 0.047 | 0.076 | 0.694 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.040 | 0.000 | 0.000 | 0.000 | 0.038 | 0.000
LATITUDE | 0.006 | -0.014 | 0.000 | 0.000 | 1.000 | 0.285 | 0.002 | 0.000 | -0.032 | -0.001 | 0.003 | 0.000 | -0.026 | -0.001
LONGITUDE | 0.006 | 0.068 | 0.000 | 0.000 | 0.285 | 1.000 | 0.002 | 0.000 | 0.075 | 0.006 | -0.014 | 0.000 | 0.039 | 0.003
NUMBER OF CYCLIST INJURED | 0.028 | 0.040 | 0.000 | 0.000 | 0.002 | 0.002 | 1.000 | 0.018 | 0.004 | 0.001 | 0.000 | 0.002 | 0.004 | 0.005
NUMBER OF CYCLIST KILLED | 0.001 | 0.004 | 0.000 | 0.000 | 0.000 | 0.000 | 0.018 | 1.000 | 0.000 | 0.000 | 0.167 | 0.707 | 0.040 | 0.736
NUMBER OF MOTORIST INJURED | 0.008 | 0.114 | 0.024 | 0.040 | -0.032 | 0.075 | 0.004 | 0.000 | 1.000 | 0.018 | -0.090 | 0.008 | 0.782 | 0.008
NUMBER OF MOTORIST KILLED | 0.004 | 0.009 | 0.000 | 0.000 | -0.001 | 0.006 | 0.001 | 0.000 | 0.018 | 1.000 | -0.004 | 0.017 | 0.012 | 0.627
NUMBER OF PEDESTRIANS INJURED | 0.002 | 0.033 | 0.143 | 0.000 | 0.003 | -0.014 | 0.000 | 0.167 | -0.090 | -0.004 | 1.000 | 0.169 | 0.412 | -0.002
NUMBER OF PEDESTRIANS KILLED | 0.000 | 0.004 | 0.000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.707 | 0.008 | 0.017 | 0.169 | 1.000 | 0.036 | 0.693
NUMBER OF PERSONS INJURED | 0.008 | 0.147 | 0.026 | 0.038 | -0.026 | 0.039 | 0.004 | 0.040 | 0.782 | 0.012 | 0.412 | 0.036 | 1.000 | 0.003
NUMBER OF PERSONS KILLED | 0.002 | 0.011 | 0.000 | 0.000 | -0.001 | 0.003 | 0.005 | 0.736 | 0.008 | 0.627 | -0.002 | 0.693 | 0.003 | 1.000
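
To see which pairs would plausibly be tagged "High correlation", the coefficients can be filtered by magnitude. The sketch below uses a cut-off of 0.5, which is an assumption chosen only because it separates the pairs involving tagged variables from the rest of this matrix; the report does not state its actual threshold.

```python
# Selected off-diagonal coefficients copied from the correlation matrix.
correlations = {
    ("CONTRIBUTING FACTOR VEHICLE 4", "CONTRIBUTING FACTOR VEHICLE 5"): 0.694,
    ("NUMBER OF CYCLIST KILLED", "NUMBER OF PEDESTRIANS KILLED"): 0.707,
    ("NUMBER OF CYCLIST KILLED", "NUMBER OF PERSONS KILLED"): 0.736,
    ("NUMBER OF MOTORIST INJURED", "NUMBER OF PERSONS INJURED"): 0.782,
    ("NUMBER OF MOTORIST KILLED", "NUMBER OF PERSONS KILLED"): 0.627,
    ("NUMBER OF PEDESTRIANS KILLED", "NUMBER OF PERSONS KILLED"): 0.693,
    ("NUMBER OF PEDESTRIANS INJURED", "NUMBER OF PERSONS INJURED"): 0.412,
    ("LATITUDE", "LONGITUDE"): 0.285,
    ("NUMBER OF MOTORIST INJURED", "NUMBER OF PEDESTRIANS INJURED"): -0.090,
}

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the report

# Keep pairs whose absolute coefficient meets the threshold, strongest first.
high = sorted(
    (pair for pair, r in correlations.items() if abs(r) >= THRESHOLD),
    key=lambda pair: -abs(correlations[pair]),
)

for a, b in high:
    print(f"{a} <-> {b}: {correlations[(a, b)]:.3f}")
```

Most of the strong pairs link a total (persons injured or killed) to one of its components, which is expected because the totals are sums over pedestrians, cyclists, and motorists.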

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
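
Nullity by column and nullity correlation can be illustrated on a handful of rows. The sketch below uses four columns from the first ten sample rows of this dataset and takes nullity correlation to be the Pearson correlation between missing-value indicators; that definition is an assumption about how the heatmap is computed, and ten rows are far too few to represent the full data.

```python
import math

# Four columns from the first ten sample rows; None marks a missing (NaN) cell.
columns = {
    "BOROUGH": [None, None, None, "BROOKLYN", "BROOKLYN",
                None, None, "BRONX", "BROOKLYN", "MANHATTAN"],
    "ZIP CODE": [None, None, None, 11208.0, 11233.0,
                 None, None, 10475.0, 11207.0, 10017.0],
    "LATITUDE": [None, None, None, 40.667202, 40.683304,
                 None, 40.709183, 40.868160, 40.671720, 40.751440],
    "ON STREET NAME": ["WHITESTONE EXPRESSWAY", "QUEENSBORO BRIDGE UPPER",
                       "THROGS NECK BRIDGE", None, "SARATOGA AVENUE",
                       "MAJOR DEEGAN EXPRESSWAY RAMP",
                       "BROOKLYN QUEENS EXPRESSWAY", None, None, "3 AVENUE"],
}

def nullity(values):
    """Return 1.0 where a value is missing and 0.0 where it is present."""
    return [1.0 if v is None else 0.0 for v in values]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

indicators = {name: nullity(values) for name, values in columns.items()}
missing_share = {name: sum(ind) / len(ind) for name, ind in indicators.items()}

def nullity_corr(a, b):
    return pearson(indicators[a], indicators[b])

print(missing_share)
print(nullity_corr("BOROUGH", "ZIP CODE"))
print(nullity_corr("BOROUGH", "LATITUDE"))
print(nullity_corr("BOROUGH", "ON STREET NAME"))
```

In this small sample BOROUGH and ZIP CODE are always missing together, while ON STREET NAME tends to be present exactly when BOROUGH is absent, giving a negative nullity correlation.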

Sample

(index) | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5
0 | 09/11/2021 | 2:39 | NaN | NaN | NaN | NaN | NaN | WHITESTONE EXPRESSWAY | 20 AVENUE | NaN | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Aggressive Driving/Road Rage | Unspecified | NaN | NaN | NaN | 4455765 | Sedan | Sedan | NaN | NaN | NaN
1 | 03/26/2022 | 11:45 | NaN | NaN | NaN | NaN | NaN | QUEENSBORO BRIDGE UPPER | NaN | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Pavement Slippery | NaN | NaN | NaN | NaN | 4513547 | Sedan | NaN | NaN | NaN | NaN
2 | 06/29/2022 | 6:55 | NaN | NaN | NaN | NaN | NaN | THROGS NECK BRIDGE | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | NaN | NaN | NaN | 4541903 | Sedan | Pick-up Truck | NaN | NaN | NaN
3 | 09/11/2021 | 9:35 | BROOKLYN | 11208.0 | 40.667202 | -73.866500 | (40.667202, -73.8665) | NaN | NaN | 1211 LORING AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | NaN | NaN | NaN | NaN | 4456314 | Sedan | NaN | NaN | NaN | NaN
4 | 12/14/2021 | 8:13 | BROOKLYN | 11233.0 | 40.683304 | -73.917274 | (40.683304, -73.917274) | SARATOGA AVENUE | DECATUR STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | 4486609 | NaN | NaN | NaN | NaN | NaN
5 | 04/14/2021 | 12:47 | NaN | NaN | NaN | NaN | NaN | MAJOR DEEGAN EXPRESSWAY RAMP | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4407458 | Dump | Sedan | NaN | NaN | NaN
6 | 12/14/2021 | 17:05 | NaN | NaN | 40.709183 | -73.956825 | (40.709183, -73.956825) | BROOKLYN QUEENS EXPRESSWAY | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486555 | Sedan | Tractor Truck Diesel | NaN | NaN | NaN
7 | 12/14/2021 | 8:17 | BRONX | 10475.0 | 40.868160 | -73.831480 | (40.86816, -73.83148) | NaN | NaN | 344 BAYCHESTER AVENUE | 2.0 | 0.0 | 0 | 0 | 0 | 0 | 2 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4486660 | Sedan | Sedan | NaN | NaN | NaN
8 | 12/14/2021 | 21:10 | BROOKLYN | 11207.0 | 40.671720 | -73.897100 | (40.67172, -73.8971) | NaN | NaN | 2047 PITKIN AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inexperience | Unspecified | NaN | NaN | NaN | 4487074 | Sedan | NaN | NaN | NaN | NaN
9 | 12/14/2021 | 14:58 | MANHATTAN | 10017.0 | 40.751440 | -73.973970 | (40.75144, -73.97397) | 3 AVENUE | EAST 43 STREET | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4486519 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN
|  | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
| 2139038 | 11/30/2024 | 20:40 | BROOKLYN | 11211.0 | 40.713100 | -73.957466 | (40.7131, -73.957466) | NaN | NaN | 291 GRAND ST | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Failure to Yield Right-of-Way | Unspecified | NaN | NaN | NaN | 4775771 | Sedan | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN |
| 2139039 | 11/30/2024 | 23:00 | BROOKLYN | 11231.0 | 40.675102 | -74.001686 | (40.675102, -74.001686) | CLINTON ST | MILL ST | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | Unspecified | NaN | NaN | NaN | 4775359 | Sedan | NaN | NaN | NaN | NaN |
| 2139040 | 11/30/2024 | 9:58 | BROOKLYN | 11208.0 | 40.678535 | -73.875230 | (40.678535, -73.87523) | NaN | NaN | 1 WELLS ST | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Other Vehicular | Other Vehicular | Brakes Defective | NaN | NaN | 4775336 | Station Wagon/Sport Utility Vehicle | Sedan | Pick-up Truck | NaN | NaN |
| 2139041 | 11/26/2024 | 20:49 | MANHATTAN | 10031.0 | 40.823940 | -73.948555 | (40.82394, -73.948555) | W 143 ST | AMSTERDAM AVE | NaN | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4775721 | Taxi | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN |
| 2139042 | 11/30/2024 | 15:25 | NaN | NaN | 40.704494 | -73.817430 | (40.704494, -73.81743) | VAN WYCK EXPWY | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NaN | NaN | NaN | 4775503 | Station Wagon/Sport Utility Vehicle | Sedan | NaN | NaN | NaN |
| 2139043 | 11/30/2024 | 21:30 | QUEENS | 11373.0 | 40.742190 | -73.869545 | (40.74219, -73.869545) | NaN | NaN | 94-18 CORONA AVE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4775566 | Station Wagon/Sport Utility Vehicle | Bike | NaN | NaN | NaN |
| 2139044 | 11/26/2024 | 12:55 | MANHATTAN | 10025.0 | 40.803566 | -73.967140 | (40.803566, -73.96714) | W 109 ST | BROADWAY | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Failure to Yield Right-of-Way | Unspecified | NaN | NaN | NaN | 4775621 | Bike | Station Wagon/Sport Utility Vehicle | NaN | NaN | NaN |
| 2139045 | 11/30/2024 | 0:36 | NaN | NaN | 40.666435 | -73.834780 | (40.666435, -73.83478) | BELT PARKWAY | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Following Too Closely | NaN | NaN | NaN | 4775484 | Sedan | Sedan | NaN | NaN | NaN |
| 2139046 | 11/29/2024 | 12:14 | QUEENS | 11373.0 | 40.741160 | -73.882706 | (40.74116, -73.882706) | NaN | NaN | 45-11 82 ST | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4775820 | Sedan | Box Truck | NaN | NaN | NaN |
| 2139047 | 11/30/2024 | 4:42 | QUEENS | 11435.0 | 40.698463 | -73.808205 | (40.698463, -73.808205) | NaN | NaN | 144-06 94 AVE | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Unspecified | NaN | NaN | NaN | 4775537 | Sedan | Sedan | NaN | NaN | NaN |